Using a data sample collected at Tongji Hospital in Wuhan between January 10 and February 18, 2020, the study determined the parameters most relevant for estimating the probability of death of a patient with suspected COVID-19. The values of the following blood indicators were identified as the parameters most strongly correlated with the probability of death:
Patient age is also correlated with survival. However, because correlation does not imply causation, age was excluded from the analysis.
The analysis confirms the conclusions about key parameters stated by Li Yan et al. in the article "An interpretable mortality prediction model for COVID-19 patients". The importance of lactate dehydrogenase, high-sensitivity C-reactive protein and lymphocyte levels was also confirmed by the data presented in this report.
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19041)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=Polish_Poland.1250 LC_CTYPE=Polish_Poland.1250
## [3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C
## [5] LC_TIME=Polish_Poland.1250
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] caret_6.0-88 lattice_0.20-44 plotly_4.9.3 ggplot2_3.3.3
## [5] rmarkdown_2.8 knitr_1.33 zoo_1.8-9 readxl_1.3.1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.6 lubridate_1.7.10 tidyr_1.1.3
## [4] class_7.3-19 digest_0.6.27 ipred_0.9-11
## [7] foreach_1.5.1 utf8_1.2.1 R6_2.5.0
## [10] cellranger_1.1.0 plyr_1.8.6 stats4_4.1.0
## [13] evaluate_0.14 httr_1.4.2 pillar_1.6.1
## [16] rlang_0.4.11 lazyeval_0.2.2 data.table_1.14.0
## [19] rpart_4.1-15 Matrix_1.3-3 splines_4.1.0
## [22] gower_0.2.2 stringr_1.4.0 htmlwidgets_1.5.3
## [25] munsell_0.5.0 compiler_4.1.0 xfun_0.23
## [28] pkgconfig_2.0.3 htmltools_0.5.1.1 nnet_7.3-16
## [31] tidyselect_1.1.1 tibble_3.1.2 prodlim_2019.11.13
## [34] codetools_0.2-18 fansi_0.5.0 viridisLite_0.4.0
## [37] crayon_1.4.1 dplyr_1.0.6 withr_2.4.2
## [40] MASS_7.3-54 recipes_0.1.16 ModelMetrics_1.2.2.2
## [43] grid_4.1.0 nlme_3.1-152 jsonlite_1.7.2
## [46] gtable_0.3.0 lifecycle_1.0.0 magrittr_2.0.1
## [49] pROC_1.17.0.1 scales_1.1.1 stringi_1.6.1
## [52] reshape2_1.4.4 timeDate_3043.102 ellipsis_0.3.2
## [55] generics_0.1.0 vctrs_0.3.8 lava_1.6.9
## [58] iterators_1.0.13 tools_4.1.0 glue_1.4.2
## [61] purrr_0.3.4 survival_3.2-11 yaml_2.2.1
## [64] colorspace_2.0-1
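# Packages used by the code below (all appear as attached in the sessionInfo() output)
library(readxl); library(ggplot2); library(plotly); library(caret)
rm(list=ls())
setwd("C:/Users/adamc/OneDrive/Desktop/Studia Podyplomowe/Projekt R")
coviddata <- read_excel("wuhan_blood_sample_data_Jan_Feb_2020.xlsx")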
## New names:
## * `` -> ...1
colnames(coviddata)[1] <- "Patient ID"
colnames(coviddata)[2] <- "Date of entry"
colnames(coviddata)[3] <- "Age"
colnames(coviddata)[4] <- "Gender"
colnames(coviddata)[7] <- "Outcome"
colnames(coviddata)[9] <- "Hemoglobin"
colnames(coviddata)[12] <- "Procalcitonin"
colnames(coviddata)[13] <- "Eosinophils"
colnames(coviddata)[16] <- "Albumin"
colnames(coviddata)[17] <- "Basophil"
colnames(coviddata)[21] <- "Monocytes"
colnames(coviddata)[22] <- "Antithrombin"
colnames(coviddata)[24] <- "Indirect bilirubin"
colnames(coviddata)[26] <- "Neutrophils"
colnames(coviddata)[27] <- "Total protein"
colnames(coviddata)[31] <- "Mean corpuscular volume"
colnames(coviddata)[32] <- "Hematocrit"
colnames(coviddata)[33] <- "White blood cell count"
colnames(coviddata)[34] <- "Tumor necrosis factor alpha"
colnames(coviddata)[35] <- "Mean corpuscular hemoglobin concentration"
colnames(coviddata)[36] <- "Fibrinogen"
colnames(coviddata)[39] <- "Lymphocyte count"
colnames(coviddata)[45] <- "Glucose"
colnames(coviddata)[46] <- "Neutrophils count"
colnames(coviddata)[49] <- "Ferritin"
colnames(coviddata)[52] <- "Lymphocyte"
colnames(coviddata)[56] <- "Aspartate aminotransferase"
colnames(coviddata)[59] <- "Calcium"
colnames(coviddata)[62] <- "Platelet large cell ratio"
colnames(coviddata)[65] <- "Monocytes count"
colnames(coviddata)[67] <- "Globuline"
colnames(coviddata)[68] <- "Gamma-glutamyl transpeptidase"
colnames(coviddata)[70] <- "Basophil count"
colnames(coviddata)[72] <- "Mean corpuscular hemoglobin"
colnames(coviddata)[76] <- "Serum sodium"
colnames(coviddata)[77] <- "Thrombocytocrit"
colnames(coviddata)[79] <- "Glutamic-pyruvid transaminase"
colnames(coviddata)[81] <- "Creatinine"
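The renames above address columns by position, so they would silently mislabel data if the column order of the spreadsheet ever changed. A minimal sketch of a name-based alternative follows; the original header names are not shown in this report, so the source names (`PATIENT_ID`, `age`, `gender`) and the helper `rename_columns` are hypothetical.

```r
# Toy data frame standing in for the spreadsheet; the source column names are hypothetical.
toy <- data.frame(PATIENT_ID = 1:3, age = c(45, 61, 70), gender = c(1, 2, 1))

# Lookup vector: old name -> new name. Columns not listed keep their names.
new_names <- c(PATIENT_ID = "Patient ID", age = "Age", gender = "Gender")

# Hypothetical helper: rename by name and fail loudly if a column is missing.
rename_columns <- function(df, lookup) {
  missing <- setdiff(names(lookup), colnames(df))
  if (length(missing) > 0) {
    stop("Columns not found: ", paste(missing, collapse = ", "))
  }
  idx <- match(names(lookup), colnames(df))
  colnames(df)[idx] <- unname(lookup)
  df
}

toy <- rename_columns(toy, new_names)
print(colnames(toy))
```

Failing on a missing column turns a silent mislabelling into an immediate error.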
print(paste("Number of gathered inputs:",nrow(coviddata),sep=" "))
## [1] "Number of gathered inputs: 6120"
print(paste("Number of analyzed patients:",max(coviddata$`Patient ID`),sep=" "))
## [1] "Number of analyzed patients: 375"
print(paste("Lowest patients' age:",min(coviddata$Age),sep=" "))
## [1] "Lowest patients' age: 18"
print(paste("Mean patients' age:",mean(coviddata$Age),sep=" "))
## [1] "Mean patients' age: 59.4433006535948"
print(paste("Highest patients' age:",max(coviddata$Age),sep=" "))
## [1] "Highest patients' age: 95"
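The table holds 6120 inputs for 375 patients, so a statistic computed over all rows weights each patient by the number of measurements recorded for them. The sketch below, on made-up data with hypothetical column names, shows how a row-level mean can differ from a patient-level mean once each patient is counted only once.

```r
# Toy data: patient 1 has three rows, patient 2 one row, patient 3 two rows.
toy <- data.frame(
  patient_id = c(1, 1, 1, 2, 3, 3),
  age        = c(30, 30, 30, 60, 90, 90)
)

# Mean over all rows: patients with more measurements count more.
row_level_mean <- mean(toy$age)

# Keep the first row of each patient, then average: every patient counts once.
per_patient <- toy[!duplicated(toy$patient_id), ]
patient_level_mean <- mean(per_patient$age)

print(c(row_level = row_level_mean, patient_level = patient_level_mean))
```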
## [1] "Patients age"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.00 47.00 62.00 59.44 71.00 95.00
## Hypersensitive cardiac troponinI Hemoglobin Serum chloride
## Min. : 1.9 Min. : 6.4 Min. : 71.50
## 1st Qu.: 4.4 1st Qu.:113.0 1st Qu.: 99.05
## Median : 20.6 Median :125.0 Median :102.10
## Mean : 1223.2 Mean :123.1 Mean :103.14
## 3rd Qu.: 223.8 3rd Qu.:137.0 3rd Qu.:105.65
## Max. :50000.0 Max. :178.0 Max. :140.40
## NA's :5613 NA's :5145 NA's :5145
## Prothrombin time Procalcitonin Eosinophils Interleukin 2 receptor
## Min. : 11.50 Min. : 0.020 Min. :0.000 Min. : 61.0
## 1st Qu.: 13.60 1st Qu.: 0.040 1st Qu.:0.000 1st Qu.: 459.5
## Median : 14.80 Median : 0.100 Median :0.100 Median : 676.5
## Mean : 16.68 Mean : 1.107 Mean :0.629 Mean : 907.2
## 3rd Qu.: 16.70 3rd Qu.: 0.405 3rd Qu.:0.800 3rd Qu.:1155.5
## Max. :120.00 Max. :57.170 Max. :8.600 Max. :7500.0
## NA's :5458 NA's :5661 NA's :5163 NA's :5852
## Alkaline phosphatase Albumin Basophil Interleukin 10
## Min. : 17.00 Min. :13.60 Min. :0.00 Min. : 5.00
## 1st Qu.: 54.00 1st Qu.:27.40 1st Qu.:0.10 1st Qu.: 5.00
## Median : 69.50 Median :32.20 Median :0.20 Median : 5.90
## Mean : 82.47 Mean :32.01 Mean :0.21 Mean : 16.07
## 3rd Qu.: 95.00 3rd Qu.:36.60 3rd Qu.:0.30 3rd Qu.: 12.35
## Max. :620.00 Max. :48.60 Max. :1.70 Max. :1000.00
## NA's :5190 NA's :5186 NA's :5163 NA's :5853
## Total bilirubin Platelet count Monocytes Antithrombin
## Min. : 2.50 Min. : -1.0 Min. : 0.300 Min. : 20.00
## 1st Qu.: 7.40 1st Qu.:109.0 1st Qu.: 2.800 1st Qu.: 74.00
## Median : 10.70 Median :178.0 Median : 5.700 Median : 86.00
## Mean : 16.70 Mean :184.3 Mean : 6.155 Mean : 85.32
## 3rd Qu.: 16.77 3rd Qu.:248.0 3rd Qu.: 8.600 3rd Qu.: 97.00
## Max. :505.70 Max. :558.0 Max. :53.000 Max. :136.00
## NA's :5190 NA's :5163 NA's :5162 NA's :5790
## Interleukin 8 Indirect bilirubin Red blood cell distribution width
## Min. : 5.000 Min. : 0.100 Min. :10.60
## 1st Qu.: 8.675 1st Qu.: 3.800 1st Qu.:12.00
## Median : 16.000 Median : 5.400 Median :12.60
## Mean : 83.088 Mean : 6.889 Mean :13.07
## 3rd Qu.: 35.200 3rd Qu.: 8.000 3rd Qu.:13.70
## Max. :6795.000 Max. :145.100 Max. :27.10
## NA's :5852 NA's :5214 NA's :5197
## Neutrophils Total protein Quantification of Treponema pallidum antibodies
## Min. : 1.7 Min. :31.80 Min. : 0.020
## 1st Qu.:65.1 1st Qu.:61.00 1st Qu.: 0.040
## Median :82.4 Median :65.90 Median : 0.050
## Mean :77.6 Mean :65.30 Mean : 0.132
## 3rd Qu.:92.3 3rd Qu.:70.45 3rd Qu.: 0.070
## Max. :98.9 Max. :88.70 Max. :11.950
## NA's :5163 NA's :5189 NA's :5841
## Prothrombin activity HBsAg Mean corpuscular volume Hematocrit
## Min. : 6.00 Min. : 0.000 Min. : 61.60 Min. :14.50
## 1st Qu.: 65.00 1st Qu.: 0.000 1st Qu.: 86.90 1st Qu.:33.50
## Median : 81.00 Median : 0.010 Median : 90.10 Median :36.60
## Mean : 78.55 Mean : 8.306 Mean : 90.39 Mean :36.55
## 3rd Qu.: 95.00 3rd Qu.: 0.010 3rd Qu.: 93.90 3rd Qu.:39.90
## Max. :142.00 Max. :250.000 Max. :118.90 Max. :52.30
## NA's :5461 NA's :5841 NA's :5163 NA's :5163
## White blood cell count Tumor necrosis factor alpha
## Min. : 0.13 Min. : 4.00
## 1st Qu.: 4.94 1st Qu.: 6.70
## Median : 7.72 Median : 8.60
## Mean : 15.60 Mean : 11.58
## 3rd Qu.: 12.72 3rd Qu.: 11.50
## Max. :1726.60 Max. :168.00
## NA's :4993 NA's :5852
## Mean corpuscular hemoglobin concentration Fibrinogen Interleukin 1ß
## Min. :286.0 Min. : 0.500 Min. : 5.00
## 1st Qu.:333.0 1st Qu.: 3.050 1st Qu.: 5.00
## Median :343.0 Median : 4.120 Median : 5.00
## Mean :342.8 Mean : 4.294 Mean : 6.51
## 3rd Qu.:350.0 3rd Qu.: 5.480 3rd Qu.: 5.00
## Max. :514.0 Max. :10.780 Max. :88.50
## NA's :5163 NA's :5554 NA's :5852
## Urea Lymphocyte count PH value Red blood cell count
## Min. : 0.800 Min. : 0.000 Min. :5.000 Min. : 0.100
## 1st Qu.: 4.000 1st Qu.: 0.460 1st Qu.:6.000 1st Qu.: 3.680
## Median : 5.985 Median : 0.800 Median :6.500 Median : 4.140
## Mean : 9.589 Mean : 1.017 Mean :6.484 Mean : 9.288
## 3rd Qu.:11.400 3rd Qu.: 1.310 3rd Qu.:7.294 3rd Qu.: 4.650
## Max. :68.400 Max. :52.420 Max. :7.565 Max. :749.500
## NA's :5184 NA's :5163 NA's :5736 NA's :4993
## Eosinophil count Corrected calcium Serum potassium Glucose
## Min. :0.000 Min. :1.650 Min. : 2.760 Min. : 1.000
## 1st Qu.:0.000 1st Qu.:2.270 1st Qu.: 3.950 1st Qu.: 5.550
## Median :0.010 Median :2.360 Median : 4.410 Median : 6.990
## Mean :0.039 Mean :2.355 Mean : 4.509 Mean : 8.889
## 3rd Qu.:0.060 3rd Qu.:2.440 3rd Qu.: 4.870 3rd Qu.:10.260
## Max. :0.490 Max. :2.790 Max. :12.800 Max. :43.010
## NA's :5163 NA's :5206 NA's :5140 NA's :5345
## Neutrophils count Direct bilirubin Mean platelet volume Ferritin
## Min. : 0.06 Min. : 1.600 Min. : 8.50 Min. : 17.8
## 1st Qu.: 3.09 1st Qu.: 3.225 1st Qu.:10.10 1st Qu.: 377.2
## Median : 5.85 Median : 4.800 Median :10.80 Median : 711.0
## Mean : 7.81 Mean : 9.887 Mean :10.91 Mean : 1379.1
## 3rd Qu.:10.95 3rd Qu.: 8.275 3rd Qu.:11.50 3rd Qu.: 1425.2
## Max. :33.88 Max. :360.600 Max. :15.00 Max. :50000.0
## NA's :5163 NA's :5190 NA's :5258 NA's :5837
## RBC distribution width SD Thrombin time Lymphocyte
## Min. : 31.30 Min. : 13.00 Min. : 0.000
## 1st Qu.: 38.50 1st Qu.: 15.60 1st Qu.: 3.925
## Median : 40.90 Median : 16.80 Median :11.450
## Mean : 42.44 Mean : 18.17 Mean :15.392
## 3rd Qu.: 44.70 3rd Qu.: 18.38 3rd Qu.:24.975
## Max. :113.30 Max. :161.90 Max. :60.000
## NA's :5197 NA's :5554 NA's :5162
## HCV antibody quantification D-D dimer Total cholesterol
## Min. :0.020 Min. : 0.210 Min. :0.100
## 1st Qu.:0.040 1st Qu.: 0.603 1st Qu.:3.010
## Median :0.060 Median : 2.155 Median :3.630
## Mean :0.117 Mean : 7.943 Mean :3.689
## 3rd Qu.:0.090 3rd Qu.:21.000 3rd Qu.:4.265
## Max. :2.090 Max. :60.000 Max. :7.300
## NA's :5841 NA's :5490 NA's :5189
## Aspartate aminotransferase Uric acid HCO3- Calcium
## Min. : 6.00 Min. : 43.0 Min. : 6.30 Min. :1.170
## 1st Qu.: 19.50 1st Qu.: 183.2 1st Qu.:21.00 1st Qu.:1.980
## Median : 27.00 Median : 243.7 Median :23.50 Median :2.080
## Mean : 46.53 Mean : 276.1 Mean :23.14 Mean :2.078
## 3rd Qu.: 42.00 3rd Qu.: 333.8 3rd Qu.:25.90 3rd Qu.:2.190
## Max. :1858.00 Max. :1176.0 Max. :36.30 Max. :2.620
## NA's :5185 NA's :5186 NA's :5186 NA's :5141
## Amino-terminal brain natriuretic peptide precursor(NT-proBNP)
## Min. : 5
## 1st Qu.: 150
## Median : 585
## Mean : 3669
## 3rd Qu.: 2625
## Max. :70000
## NA's :5645
## Lactate dehydrogenase Platelet large cell ratio Interleukin 6
## Min. : 110.0 Min. :11.20 Min. : 1.500
## 1st Qu.: 218.0 1st Qu.:25.60 1st Qu.: 4.772
## Median : 340.0 Median :30.90 Median : 19.265
## Mean : 474.2 Mean :31.77 Mean : 112.308
## 3rd Qu.: 601.8 3rd Qu.:37.20 3rd Qu.: 60.167
## Max. :1867.0 Max. :62.20 Max. :5000.000
## NA's :5186 NA's :5258 NA's :5848
## Fibrin degradation products Monocytes count PLT distribution width
## Min. : 4.00 Min. : 0.010 Min. : 8.00
## 1st Qu.: 4.00 1st Qu.: 0.270 1st Qu.:11.10
## Median : 17.90 Median : 0.410 Median :12.40
## Mean : 61.35 Mean : 0.526 Mean :13.01
## 3rd Qu.:150.00 3rd Qu.: 0.580 3rd Qu.:14.30
## Max. :190.80 Max. :39.920 Max. :25.30
## NA's :5790 NA's :5163 NA's :5258
## Globuline Gamma-glutamyl transpeptidase International standard ratio
## Min. :10.10 Min. : 3.00 Min. : 0.840
## 1st Qu.:29.70 1st Qu.: 22.00 1st Qu.: 1.030
## Median :32.70 Median : 34.00 Median : 1.140
## Mean :33.24 Mean : 55.34 Mean : 1.313
## 3rd Qu.:36.50 3rd Qu.: 58.00 3rd Qu.: 1.330
## Max. :50.60 Max. :732.00 Max. :13.480
## NA's :5190 NA's :5190 NA's :5461
## Basophil count 2019-nCoV nucleic acid detection Mean corpuscular hemoglobin
## Min. :0.000 Min. :-1 Min. :20.4
## 1st Qu.:0.010 1st Qu.:-1 1st Qu.:29.7
## Median :0.010 Median :-1 Median :30.9
## Mean :0.017 Mean :-1 Mean :31.0
## 3rd Qu.:0.020 3rd Qu.:-1 3rd Qu.:32.2
## Max. :0.120 Max. :-1 Max. :50.8
## NA's :5163 NA's :5619 NA's :5163
## Activation of partial thromboplastin time High sensitivity C-reactive protein
## Min. : 21.80 Min. : 0.10
## 1st Qu.: 35.30 1st Qu.: 5.70
## Median : 39.20 Median : 51.50
## Mean : 41.52 Mean : 76.24
## 3rd Qu.: 44.12 3rd Qu.:118.50
## Max. :144.00 Max. :320.00
## NA's :5552 NA's :5383
## HIV antibody quantification Serum sodium Thrombocytocrit ESR
## Min. :0.05 Min. :115.4 Min. :0.010 Min. : 1.00
## 1st Qu.:0.07 1st Qu.:137.7 1st Qu.:0.150 1st Qu.: 14.00
## Median :0.09 Median :140.4 Median :0.210 Median : 28.00
## Mean :0.10 Mean :141.6 Mean :0.212 Mean : 33.69
## 3rd Qu.:0.11 3rd Qu.:143.5 3rd Qu.:0.270 3rd Qu.: 45.50
## Max. :0.27 Max. :179.7 Max. :0.510 Max. :110.00
## NA's :5842 NA's :5145 NA's :5258 NA's :5737
## Glutamic-pyruvid transaminase eGFR Creatinine
## Min. : 5.00 Min. : 2.00 Min. : 11.00
## 1st Qu.: 16.00 1st Qu.: 63.58 1st Qu.: 58.00
## Median : 24.00 Median : 87.90 Median : 76.00
## Mean : 38.86 Mean : 81.56 Mean : 109.93
## 3rd Qu.: 41.00 3rd Qu.:103.97 3rd Qu.: 98.25
## Max. :1600.00 Max. :224.00 Max. :1497.00
## NA's :5189 NA's :5184 NA's :5184
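Two things stand out in the summary above: most indicators are missing in more than 5000 of the 6120 rows, and some columns contain -1 (the minimum of Platelet count, and every recorded value of 2019-nCoV nucleic acid detection). The sketch below assumes that -1 is a placeholder rather than a real measurement, which the report does not confirm. It recodes that value as missing and reports the share of missing values per column on made-up data.

```r
# Toy data with hypothetical column names.
toy <- data.frame(
  platelet = c(180, -1, NA, 250),
  ldh      = c(NA, 300, NA, 610)
)

# Assumption: -1 marks an unavailable result, so it is recoded as NA.
clean <- toy
clean[] <- lapply(clean, function(x) {
  x[!is.na(x) & x == -1] <- NA
  x
})

# Share of missing values per column after recoding.
missing_share <- colMeans(is.na(clean))
print(missing_share)
```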
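# Absolute Pearson correlation between Outcome (column 7) and each blood indicator (columns 8-81)
outcome <- dplyr::pull(coviddata, 7)
pearsoncor <- data.frame(indicator = character(0), correlation = numeric(0))
for (i in 8:81) {
  analyzed_data <- dplyr::pull(coviddata, i)
  # cor.test() uses complete pairs only; keep the estimate numeric so sorting is numeric, not alphabetical
  corvalue <- unname(cor.test(outcome, analyzed_data)$estimate)
  pearsoncor <- rbind(pearsoncor, data.frame(indicator = colnames(coviddata)[i], correlation = abs(corvalue)))
}
pearsoncor <- pearsoncor[order(pearsoncor$correlation, decreasing = TRUE), ]
plot(pearsoncor$correlation[1:12], main = "Most important factors predicting outcome", ylab = "Absolute Pearson correlation")
text(pearsoncor$correlation[1:12], labels = pearsoncor$indicator[1:12], cex = 0.7)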
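Since Outcome is a 0/1 variable, its Pearson correlation with a continuous indicator is a point-biserial correlation. A minimal sketch of the ranking step on made-up data follows; the marker names are hypothetical, and `use = "complete.obs"` is chosen here to skip rows with missing values.

```r
# Toy data: marker_a rises with the outcome, marker_c falls, marker_b is weaker and has gaps.
toy <- data.frame(
  outcome  = c(0, 0, 0, 1, 1, 1),
  marker_a = c(1, 2, 3, 7, 8, 9),
  marker_b = c(5, NA, 4, 5, 6, NA),
  marker_c = c(9, 8, 7, 3, 2, 1)
)

markers <- setdiff(colnames(toy), "outcome")

# Absolute correlation per marker, computed on complete pairs only.
abs_cor <- vapply(markers, function(m) {
  abs(cor(toy$outcome, toy[[m]], use = "complete.obs"))
}, numeric(1))

# Numeric sort, strongest association first.
ranking <- sort(abs_cor, decreasing = TRUE)
print(ranking)
```

Taking the absolute value ranks strongly negative associations (such as `marker_c`) alongside strongly positive ones, which matches the use of `abs()` in the report.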
# Histograms of the most strongly correlated indicators, outlined by outcome (the y axis shows the number of rows per bin)
p <- ggplot(coviddata, aes(x = Neutrophils, color = factor(Outcome))) + geom_histogram(binwidth = 1, fill = "beige") +
  xlab("Neutrophils") + ylab("Count") + ggtitle("Impact of Neutrophils on Outcome") + scale_color_manual(labels = c("Survived", "Died"), values = c("darkgreen", "red")) + labs(color = "Outcome")
plot(p)
p2 <- ggplot(coviddata, aes(x = Lymphocyte, color = factor(Outcome))) + geom_histogram(binwidth = 1, fill = "beige") +
  xlab("Lymphocyte") + ylab("Count") + ggtitle("Impact of Lymphocyte on Outcome") + scale_color_manual(labels = c("Survived", "Died"), values = c("darkgreen", "red")) + labs(color = "Outcome")
plot(p2)
p3 <- ggplot(coviddata, aes(x = Albumin, color = factor(Outcome))) + geom_histogram(binwidth = 1, fill = "beige") +
  xlab("Albumin") + ylab("Count") + ggtitle("Impact of Albumin on Outcome") + scale_color_manual(labels = c("Survived", "Died"), values = c("darkgreen", "red")) + labs(color = "Outcome")
plot(p3)
p4 <- ggplot(coviddata, aes(x = `Prothrombin activity`, color = factor(Outcome))) + geom_histogram(binwidth = 1, fill = "beige") +
  xlab("Prothrombin activity") + ylab("Count") + ggtitle("Impact of Prothrombin activity on Outcome") + scale_color_manual(labels = c("Survived", "Died"), values = c("darkgreen", "red")) + labs(color = "Outcome")
plot(p4)
p5 <- ggplot(coviddata, aes(x = `High sensitivity C-reactive protein`, color = factor(Outcome))) + geom_histogram(binwidth = 1, fill = "beige") +
  xlab("High sensitivity C-reactive protein") + ylab("Count") + ggtitle("Impact of High sensitivity C-reactive protein on Outcome") + scale_color_manual(labels = c("Survived", "Died"), values = c("darkgreen", "red")) + labs(color = "Outcome")
plot(p5)
p6 <- ggplot(coviddata, aes(x = `D-D dimer`, color = factor(Outcome))) + geom_histogram(binwidth = 1, fill = "beige") +
  xlab("D-D dimer") + ylab("Count") + ggtitle("Impact of D-D dimer on Outcome") + scale_color_manual(labels = c("Survived", "Died"), values = c("darkgreen", "red")) + labs(color = "Outcome")
plot(p6)
p7 <- ggplot(coviddata, aes(x = `Lactate dehydrogenase`, color = factor(Outcome))) + geom_histogram(binwidth = 1, fill = "beige") +
  xlab("Lactate dehydrogenase") + ylab("Count") + ggtitle("Impact of Lactate dehydrogenase on Outcome") + scale_color_manual(labels = c("Survived", "Died"), values = c("darkgreen", "red")) + labs(color = "Outcome")
plot(p7)
p8 <- ggplot(coviddata, aes(x = `Neutrophils count`, color = factor(Outcome))) + geom_histogram(binwidth = 1, fill = "beige") +
  xlab("Neutrophils count") + ylab("Count") + ggtitle("Impact of Neutrophils count on Outcome") + scale_color_manual(labels = c("Survived", "Died"), values = c("darkgreen", "red")) + labs(color = "Outcome")
plot(p8)
p9 <- ggplot(coviddata, aes(x = `Fibrin degradation products`, color = factor(Outcome))) + geom_histogram(binwidth = 1, fill = "beige") +
  xlab("Fibrin degradation products") + ylab("Count") + ggtitle("Impact of Fibrin degradation products on Outcome") + scale_color_manual(labels = c("Survived", "Died"), values = c("darkgreen", "red")) + labs(color = "Outcome")
plot(p9)
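The nine histograms differ only in the column they draw, so the shared settings could live in one function. The sketch below assumes ggplot2 is installed; the helper name `outcome_histogram` and the toy data are hypothetical, and the `.data` pronoun is used to look up a column by its name.

```r
library(ggplot2)

# Hypothetical helper: histogram of one indicator, outlined by outcome.
outcome_histogram <- function(data, column, binwidth = 1) {
  ggplot(data, aes(x = .data[[column]], color = factor(.data[["Outcome"]]))) +
    geom_histogram(binwidth = binwidth, fill = "beige", na.rm = TRUE) +
    xlab(column) + ylab("Count") +
    ggtitle(paste("Distribution of", column, "by outcome")) +
    scale_color_manual(labels = c("Survived", "Died"), values = c("darkgreen", "red")) +
    labs(color = "Outcome")
}

# Toy data; check.names = FALSE keeps the space in the column name.
toy <- data.frame(
  Outcome = c(0, 0, 1, 1),
  `Lactate dehydrogenase` = c(200, 250, 600, 900),
  check.names = FALSE
)

# One plot per requested column.
indicators <- c("Lactate dehydrogenase")
plots <- lapply(indicators, function(col) outcome_histogram(toy, col, binwidth = 50))
```

A per-indicator `binwidth` also helps where the value ranges differ by orders of magnitude, as they do between Albumin and Lactate dehydrogenase in the summary output.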
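# Recode gender in one step (1 = Male, 2 = Female) instead of relying on implicit numeric-to-character coercion
coviddata$Gender <- factor(coviddata$Gender, levels = c(1, 2), labels = c("Male", "Female"))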
#coviddata$Outcome[coviddata$Outcome==0] <- "Survived"
#coviddata$Outcome[coviddata$Outcome==1] <- "Died"
p10 <- ggplot(coviddata, aes(x=Age, y=Gender, color=Outcome)) + geom_point() + scale_color_distiller() + theme_classic() + theme(legend.title = element_blank())
ggplotly(p10)
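# Hold out 20% of the rows for validation; the remaining 80% are used for training
coviddata.training.indices <- createDataPartition(coviddata$Outcome, p = 0.80, list = FALSE)
coviddata.training <- coviddata[coviddata.training.indices,]
coviddata.validation <- coviddata[-coviddata.training.indices,]
# 10-fold cross-validation, with accuracy as the model-selection metric
control <- trainControl(method="cv", number=10)
metric <- "Accuracy"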
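The partition above samples individual rows, while the data contain several rows per patient, so measurements of one patient can end up in both the training and the validation set. One possible alternative, sketched here in base R on made-up data, is to sample patients instead of rows. The column names and the seed are assumptions, and this simple version does not stratify by outcome.

```r
set.seed(7)  # arbitrary seed so the split is repeatable

# Toy data: 10 patients with 3 rows each.
toy <- data.frame(
  patient_id = rep(1:10, each = 3),
  outcome    = rep(c(0, 1), each = 15)
)

# Draw 80% of the patients, then assign all rows of a patient to the same set.
patients <- unique(toy$patient_id)
train_patients <- sample(patients, size = round(0.8 * length(patients)))
training   <- toy[toy$patient_id %in% train_patients, ]
validation <- toy[!toy$patient_id %in% train_patients, ]

print(c(training_rows = nrow(training), validation_rows = nrow(validation)))
```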
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:plotly':
##
## select
# Candidate models, left commented out; caret's method names for CART and a radial-kernel SVM are "rpart" and "svmRadial"
#set.seed(7)
#fit.lda <- train(Outcome~., data=coviddata.training, method="lda", metric=metric, trControl=control )
#set.seed(7)
#fit.cart <- train(Outcome~., data=coviddata.training, method="rpart", metric=metric, trControl=control )
#set.seed(7)
#fit.knn <- train(Outcome~., data=coviddata.training, method="knn", metric=metric, trControl=control )
#set.seed(7)
#fit.svm <- train(Outcome~., data=coviddata.training, method="svmRadial", metric=metric, trControl=control )
#set.seed(7)
#fit.rf <- train(Outcome~., data=coviddata.training, method="rf", metric=metric, trControl=control )
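The `trainControl(method="cv", number=10)` setting requests 10-fold cross-validation. As an illustration of what that procedure computes, the sketch below runs it by hand in base R on simulated data. The logistic regression (`glm`) is a stand-in chosen for brevity and is not one of the models listed in the report; the data, column names and seed are made up.

```r
set.seed(7)  # arbitrary seed so the simulated data and folds are repeatable

# Simulated data: one marker that is higher, on average, when outcome is 1.
n <- 200
toy <- data.frame(
  marker  = c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 3)),
  outcome = rep(c(0, 1), each = n / 2)
)

# Assign every row to one of k folds at random.
k <- 10
fold <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other nine folds, measure accuracy on the held-out fold.
fold_accuracy <- vapply(seq_len(k), function(i) {
  train_set <- toy[fold != i, ]
  test_set  <- toy[fold == i, ]
  model <- glm(outcome ~ marker, data = train_set, family = binomial)
  probability <- predict(model, newdata = test_set, type = "response")
  predicted <- as.integer(probability > 0.5)
  mean(predicted == test_set$outcome)
}, numeric(1))

# The cross-validated accuracy is the average over the folds.
cv_accuracy <- mean(fold_accuracy)
print(cv_accuracy)
```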
The correlation analysis conducted in this report indicates that several factors have a major influence on the predicted patient outcome. The following parameters seem to be the most significant:
The results of the data analysis are consistent with the article "An interpretable mortality prediction model for COVID-19 patients". The outcome-prediction algorithm proposed in that article depends on three factors:
All three parameters are among the most significant factors identified in this analysis.
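To illustrate how a prediction can depend on three blood indicators, here is a minimal sketch of a threshold rule over lactate dehydrogenase, high-sensitivity C-reactive protein and lymphocyte percentage. The function name, the order of the checks and the cut-off values in the usage example are assumptions made for illustration. They are not taken from this report or from the cited article.

```r
# Hypothetical three-step rule; all cut-offs must be supplied by the caller.
predict_outcome <- function(ldh, crp, lymphocyte, ldh_cut, crp_cut, lymph_cut) {
  ifelse(ldh >= ldh_cut, "Died",                      # high LDH: predicted death
    ifelse(crp < crp_cut, "Survived",                 # low CRP: predicted survival
      ifelse(lymphocyte > lymph_cut, "Survived", "Died")))  # otherwise decide on lymphocytes
}

# Usage with made-up patients and placeholder cut-offs (illustrative values only).
result <- predict_outcome(
  ldh        = c(200, 200, 200, 800),
  crp        = c(10, 90, 90, 10),
  lymphocyte = c(20, 20, 5, 20),
  ldh_cut = 400, crp_cut = 50, lymph_cut = 15
)
print(result)
```

Passing the cut-offs as arguments keeps the rule separate from any particular set of thresholds, which would have to come from a fitted model.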